Abstract



In this paper we study various properties of finite stochastic systems, or hidden Markov chains as they are alternatively called. We discuss their construction following different approaches, and we also derive recursive filtering formulas for the different systems that we consider. The key tool is a simple lemma on conditional expectations.
1 Introduction

In this paper we consider Hidden Markov Chains (probabilistic functions of a Markov chain) that, like the underlying Markov chain, take on finitely many values. The observed process is denoted by $Y$, the underlying chain by $X$. Hidden Markov chains are such that probabilities of future events of $X$ and $Y$ given the past only depend on the current state of $X$. Typically this means that $X$ plays the role of a state process as it is used in stochastic system theory. One of the aims of the present paper is to shed some more light on the relation between stochastic systems and hidden Markov chains. There are two slightly different definitions of stochastic systems, related by a time shift of the observed process. We will see that a hidden Markov chain satisfies both definitions. We will also discuss various constructions of a hidden Markov chain. These constructions allow different factorization and splitting properties of conditional probabilities of the bivariate process $(X, Y)$. We will also study the filtering and prediction problems for the different constructions and show that the solutions coincide if one deals with a hidden Markov chain in the way we define it. The paper is organized as follows.

In section 2 we describe the probabilistic behaviour of the joint process $(X, Y)$ in more detail, using the outer product of $X$ and $Y$ and properties of Kronecker products of matrices.

In section 3 we present a somewhat different look at hidden Markov chains. It is shown that certain necessary properties of a hidden Markov chain are actually sufficient to construct one. The convenient tool is a simple lemma, presented in the appendix, on conditional expectations that involve a finitely generated $\sigma$-algebra. It is also shown that hidden Markov chains are nothing else but what in the engineering literature are called stochastic systems. In particular it is shown that hidden Markov chains satisfy two different notions of stochastic systems. It is also shown how these two notions are interrelated. This is done in section 4.

In section 5 we show how various filtering and prediction formulas are simple consequences of the key lemma on conditional expectations of the appendix.

2 Preliminaries 

Let $(\Omega, \mathcal{F}, P)$ be a probability space on which all the random variables to be encountered below are defined. Consider the following model for what we will later call a Hidden Markov Chain (HMC).

$$X_t = A X_{t-1} + \varepsilon_t, \quad X_0 \tag{1}$$

$$Y_t = H_t X_t \tag{2}$$

Here the state process $X$ is modelled as a Markov process on the set $E = \{e_1, \ldots, e_n\}$ of basis vectors of $\mathbb{R}^n$. Moreover, this process is supposed to be time-homogeneous with $A$ the matrix of one step transition probabilities: $A_{ij} = P(X_{t+1} = e_i \mid X_t = e_j)$. The process $\{\varepsilon_t\}$ is then a martingale difference sequence adapted to the filtration generated by $X$, see [?, page 17]. Throughout the paper we assume that each state $e_i$ is visited at least once by $X$. If this were not the case, it can always be accomplished by reducing the state space of $X$, taking basis vectors of a lower dimensional Euclidean space.
The observation or output process $Y$ takes its values in the set $F = \{f_1, \ldots, f_m\}$ of basis vectors of $\mathbb{R}^m$. The matrices $\{H_t\}$ are assumed to form an iid sequence, independent of $\{X_t\}$, and each column of any of these matrices is assumed to be a random element of $F$. Clearly each $H_t$ is the incidence matrix of a random map from $E$ into $F$. Indeed, if $Y_t = h_t(X_t)$, with the $h_t$ random maps from $E$ into $F$, then we can write $Y_t = \sum_{i=1}^{n} h_t(e_i) 1_{\{X_t = e_i\}}$. So we define $H_t = [h_t(e_1), \ldots, h_t(e_n)]$ to get (2).

We will only need the distributions of the columns of $H_t$ (equivalently, the marginal distributions of the $h_t(e_i)$). These are specified by the expectation $EH_t = G$, so that $G_{ji} = P(Y_t = f_j \mid X_t = e_i)$. We assume (without loss of generality) the non-degeneracy condition that none of the rows of $G$ is zero.
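The model (1)-(2) can be simulated directly. The following is a minimal sketch, not part of the original paper: the function name `simulate_hmc`, the matrices `A`, `G` and the initial law `p0` are illustrative assumptions. Since only the column distributions of $H_t$ matter, the output $Y_t$ is drawn from the column of $G$ selected by the current state.

```python
import random

def sample_index(probs, rng):
    """Draw an index k with probability probs[k]."""
    u, acc = rng.random(), 0.0
    for k, p in enumerate(probs):
        acc += p
        if u < acc:
            return k
    return len(probs) - 1  # guard against floating-point rounding

def column(M, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in M]

def simulate_hmc(A, G, p0, T, seed=0):
    """Simulate T steps of (1)-(2); states and outputs are returned as indices.

    A[i][j] = P(X_{t+1} = e_i | X_t = e_j), G[j][i] = P(Y_t = f_j | X_t = e_i).
    """
    rng = random.Random(seed)
    xs, ys = [], []
    x = sample_index(p0, rng)                        # X_0 drawn from p0
    for _ in range(T):
        xs.append(x)
        ys.append(sample_index(column(G, x), rng))   # Y_t given X_t
        x = sample_index(column(A, x), rng)          # X_{t+1} given X_t
    return xs, ys

# Illustrative (assumed) parameters: n = 2 states, m = 3 output symbols.
A = [[0.9, 0.2], [0.1, 0.8]]
G = [[0.7, 0.1], [0.2, 0.3], [0.1, 0.6]]
p0 = [0.5, 0.5]
xs, ys = simulate_hmc(A, G, p0, T=10)
```

Index `k` stands for the basis vector $e_{k+1}$ (or $f_{k+1}$ for outputs); working with indices avoids storing the vectors themselves.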

Define the filtration $\mathbb{F} = \{\mathcal{F}_t\}$ by $\mathcal{F}_t = \sigma\{X_0, \ldots, X_t, H_0, \ldots, H_t\}$. Clearly both $X$ and $Y$ are adapted to this filtration, and so is the sequence $\{\varepsilon_t\}$, which is even a martingale difference sequence w.r.t. $\mathbb{F}$, because of the independence of the sequences $\{X_t\}$ and $\{H_t\}$.

In the current set up, the joint process $\{(X_t, Y_t)\}$ is also Markov. For completeness we give its transition probabilities, already given in [?], and derive these using simple properties of conditional expectations.

Proposition 2.1 The joint process $\{(X_t, Y_t)\}$ is Markov with respect to $\mathbb{F}$ and the conditional transition probabilities are given by

$$P(X_t = e_i, Y_t = f_j \mid \mathcal{F}_{t-1}) = e_i^\top \mathrm{diag}(A X_{t-1}) G^\top f_j. \tag{3}$$

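Proof. Notice first that the indicator of the event $\{X_t = e_i, Y_t = f_j\}$ equals $e_i^\top X_t Y_t^\top f_j$. Hence we can rewrite the conditional probability in equation (3) as $E[e_i^\top X_t Y_t^\top f_j \mid \mathcal{F}_{t-1}]$. So we compute

$$\begin{aligned}
E[X_t Y_t^\top \mid \mathcal{F}_{t-1}] &= E[X_t X_t^\top H_t^\top \mid \mathcal{F}_{t-1}] \\
&= E[E[X_t X_t^\top H_t^\top \mid \mathcal{F}_{t-1} \vee \sigma(H_t)] \mid \mathcal{F}_{t-1}] \\
&= E[E[X_t X_t^\top \mid \mathcal{F}_{t-1} \vee \sigma(H_t)] H_t^\top \mid \mathcal{F}_{t-1}] \\
&= E[E[\mathrm{diag}(X_t) \mid \mathcal{F}_{t-1} \vee \sigma(H_t)] H_t^\top \mid \mathcal{F}_{t-1}] \\
&= E[\mathrm{diag}(A X_{t-1}) H_t^\top \mid \mathcal{F}_{t-1}] \\
&= \mathrm{diag}(A X_{t-1}) E[H_t^\top \mid \mathcal{F}_{t-1}] \\
&= \mathrm{diag}(A X_{t-1}) G^\top.
\end{aligned}$$

The result follows. □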
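As an illustration of formula (3), the sketch below tabulates $\mathrm{diag}(A e_k) G^\top$ for a fixed previous state $e_k$. The helper name `joint_transition` and the numerical values of `A` and `G` are assumptions chosen for the example; exact rational arithmetic is used so that the table can be checked to sum to one.

```python
from fractions import Fraction

def joint_transition(A, G, k):
    """Return the n x m table P(X_t = e_i, Y_t = f_j | X_{t-1} = e_k).

    This is diag(A e_k) G^T, whose (i, j) entry is A[i][k] * G[j][i].
    """
    n, m = len(A), len(G)
    return [[A[i][k] * G[j][i] for j in range(m)] for i in range(n)]

# Illustrative (assumed) parameters with columns summing to one.
A = [[Fraction(9, 10), Fraction(2, 10)], [Fraction(1, 10), Fraction(8, 10)]]
G = [[Fraction(7, 10), Fraction(1, 10)],
     [Fraction(2, 10), Fraction(3, 10)],
     [Fraction(1, 10), Fraction(6, 10)]]

table = joint_transition(A, G, k=0)
total = sum(sum(row) for row in table)       # total probability mass
x_marginal = [sum(row) for row in table]     # recovers column 0 of A
```

Summing the table over the output index returns the corresponding column of `A`, which is the marginal transition law of $X$.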



We will see in section 4 that it follows from proposition 2.1 that the pair $(X, Y)$ forms a stochastic system in the sense of Picci.

We continue with giving an alternative expression for the matrix of one step transition probabilities of the joint chain $(X, Y)$. The state space of this chain consists of all the $nm$ pairs $(e_i, f_j)$. These are renamed and ordered as follows: $s_{(j-1)n+i} = (e_i, f_j)$ for $i \in \{1, \ldots, n\}$ and $j \in \{1, \ldots, m\}$. Clearly the map $(i, j) \mapsto (j-1)n + i$ is bijective from $\{1, \ldots, n\} \times \{1, \ldots, m\}$ onto $\{1, \ldots, nm\}$. Instead of working with $(X, Y)$ we will use the chain $Z$ that carries the same information and which is defined by $Z_t = \mathrm{vec}(X_t Y_t^\top)$. Recall that the vec-operator applied to a matrix results in a vector where all the columns of this matrix are stacked one underneath the other [?, p. 30]. Then clearly the state space of $Z$ is the set of basis vectors of $\mathbb{R}^{nm}$. If we call this set $\{z_1, \ldots, z_{nm}\}$ we see that $(X_t, Y_t) = s_k$ iff $Z_t = z_k$. Notice also the following relations: $Z_t = Y_t \otimes X_t$, $X_t = (\mathbf{1}_m^\top \otimes I_n) Z_t$ and $Y_t = (I_m \otimes \mathbf{1}_n^\top) Z_t$. Here $I_m$ is the $m$-dimensional identity matrix and $\mathbf{1}_n$ is the $n$-dimensional column vector with all its elements equal to one.

According to proposition 2.1 we now get that the $nm \times nm$ matrix $Q$ of transition probabilities of $Z$ can be decomposed as a matrix with $m^2$ blocks $Q_{ij}$ that are equal to $\mathrm{diag}(G_{i\cdot})A$, where $G_{i\cdot}$ is the $i$-th row of $G$. For a more compact formulation we introduce (as in [?]) the following notation. Let $\Delta(G)$ be the $nm \times n$ matrix defined by

$$\Delta(G) = \begin{bmatrix} \mathrm{diag}(G_{1\cdot}) \\ \vdots \\ \mathrm{diag}(G_{m\cdot}) \end{bmatrix}.$$

Using the notation $\Delta(G)$ we can now write

$$Q = \Delta(G) A (\mathbf{1}_m^\top \otimes I_n). \tag{4}$$
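The stacked matrix $\Delta(G)$ and the product $Q = \Delta(G) A (\mathbf{1}_m^\top \otimes I_n)$ can be built explicitly for a small example. This is a sketch under assumed values of `A` and `G`; the helper names `delta`, `ones_kron_identity` and `transition_matrix_Z` are hypothetical and matrices are stored as lists of rows.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def delta(G):
    """Stack diag(G_1.), ..., diag(G_m.) into an (n*m) x n matrix."""
    m, n = len(G), len(G[0])
    return [[G[j][i] if k == i else 0 for k in range(n)]
            for j in range(m) for i in range(n)]

def ones_kron_identity(m, n):
    """Return 1_m^T kron I_n, the n x (n*m) matrix that maps Z = Y kron X to X."""
    return [[1 if c % n == r else 0 for c in range(n * m)] for r in range(n)]

def transition_matrix_Z(A, G):
    """Q = Delta(G) A (1_m^T kron I_n)."""
    m, n = len(G), len(G[0])
    return matmul(matmul(delta(G), A), ones_kron_identity(m, n))

# Illustrative (assumed) parameters: n = 2, m = 3.
A = [[Fraction(9, 10), Fraction(2, 10)], [Fraction(1, 10), Fraction(8, 10)]]
G = [[Fraction(7, 10), Fraction(1, 10)],
     [Fraction(2, 10), Fraction(3, 10)],
     [Fraction(1, 10), Fraction(6, 10)]]

Q = transition_matrix_Z(A, G)
column_sums = [sum(col) for col in zip(*Q)]  # each column is a probability vector
```

With zero-based indices, the entry of `Q` in row `j*n + i` and column `l*n + k` equals `G[j][i] * A[i][k]`, independently of `l`. This is the block structure $\mathrm{diag}(G_{j\cdot})A$ described in the text.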



In the next lemma we gather some computational results for the $\Delta$-operator, that might be of independent interest. Other properties are described in [?].



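Lemma 2.2 For any matrix $G \in \mathbb{R}^{m \times n}$, any matrix $M$ with $m$ columns and any vector $w \in \mathbb{R}^n$ we have

$$(M \otimes \mathbf{1}_n^\top) \Delta(G) = MG \tag{5}$$

$$\Delta(G) w = (I_m \otimes \mathrm{diag}(w)) \mathrm{vec}(G^\top) \tag{6}$$

$$\Delta(G) w = \mathrm{vec}(\mathrm{diag}(w) G^\top) \tag{7}$$

Proof. By direct calculation. □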

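The expression (4) for $Q$ can also be obtained through simple matrix manipulations and by application of lemma 2.2. By definition of $Q$ we have $E[Z_{t+1} \mid \mathcal{F}_t] = Q Z_t$. So we compute the conditional expectation

$$\begin{aligned}
Q Z_t &= E[Z_{t+1} \mid \mathcal{F}_t] \\
&= \mathrm{vec}(E[X_{t+1} Y_{t+1}^\top \mid \mathcal{F}_t]) \\
&= \mathrm{vec}(E[X_{t+1} X_{t+1}^\top H_{t+1}^\top \mid \mathcal{F}_t]) \\
&= \mathrm{vec}(\mathrm{diag}(A X_t) G^\top) \\
&= (I_m \otimes \mathrm{diag}(A X_t)) \mathrm{vec}(G^\top) \\
&= \Delta(G) A X_t.
\end{aligned}$$

Here we used in the fifth equality a known result for the vec-operator of the product of three matrices (see [?, page 30]) and in the sixth equality equation (6). Since $X_t = (\mathbf{1}_m^\top \otimes I_n) Z_t$, this gives (4).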

If the vector $p_0$ represents the initial distribution of $X$, then the initial distribution of $Z$ is given by the vector $EZ_0 = \mathrm{vec}(\mathrm{diag}(p_0) G^\top)$: $EZ_0 = E\,\mathrm{vec}(X_0 Y_0^\top) = \mathrm{vec}(E\,\mathrm{diag}(X_0) H_0^\top) = \mathrm{vec}(\mathrm{diag}(p_0) G^\top)$, since $X_0$ and $H_0$ are independent. Notice that $\mathrm{vec}(\mathrm{diag}(p_0) G^\top) = \Delta(G) p_0$, because of (7).

Similarly one can show that $\Delta(G)\pi$ is an invariant probability vector for $Z$, if $X$ has an invariant probability vector $\pi$.
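The claim on invariant vectors is easy to verify on an example. In the sketch below, `A`, `G` and the vector `pi` are assumed illustrative values (with `pi` invariant for `A`), and the helpers are hypothetical; the check confirms that $\Delta(G)\pi$ is fixed by $Q = \Delta(G) A (\mathbf{1}_m^\top \otimes I_n)$.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def delta(G):
    """Stack diag(G_1.), ..., diag(G_m.) into an (n*m) x n matrix."""
    m, n = len(G), len(G[0])
    return [[G[j][i] if k == i else 0 for k in range(n)]
            for j in range(m) for i in range(n)]

def ones_kron_identity(m, n):
    """Return 1_m^T kron I_n as an n x (n*m) matrix."""
    return [[1 if c % n == r else 0 for c in range(n * m)] for r in range(n)]

# Illustrative (assumed) parameters.
A = [[Fraction(9, 10), Fraction(2, 10)], [Fraction(1, 10), Fraction(8, 10)]]
G = [[Fraction(7, 10), Fraction(1, 10)],
     [Fraction(2, 10), Fraction(3, 10)],
     [Fraction(1, 10), Fraction(6, 10)]]
pi = [[Fraction(2, 3)], [Fraction(1, 3)]]   # column vector with A pi = pi

Q = matmul(matmul(delta(G), A), ones_kron_identity(len(G), len(A)))
z_pi = matmul(delta(G), pi)                 # candidate invariant vector of Z
```

The verification rests on $(\mathbf{1}_m^\top \otimes I_n)\Delta(G) = I_n$, which holds because the columns of $G$ sum to one.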

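It is easy to see from equation (3) that the "factorization property" [?] holds:

$$P(X_t = e_i, Y_t = f_j \mid \mathcal{F}_{t-1}) = P(Y_t = f_j \mid X_t = e_i) P(X_t = e_i \mid X_{t-1}). \tag{8}$$

The proof is straightforward from proposition 2.1 (used in the first equality below):

$$\begin{aligned}
P(X_t = e_i, Y_t = f_j \mid \mathcal{F}_{t-1}) &= e_i^\top \mathrm{diag}(A X_{t-1}) G^\top f_j \\
&= (A X_{t-1})^\top \mathrm{diag}(e_i) G^\top f_j \\
&= (A X_{t-1})^\top e_i e_i^\top G^\top f_j \\
&= P(X_t = e_i \mid X_{t-1}) G_{ji} \\
&= P(X_t = e_i \mid X_{t-1}) P(Y_t = f_j \mid X_t = e_i).
\end{aligned}$$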

Using the matrix $\Delta(G)$ and lemma 2.2 we can also compactly rephrase the factorization property (8). It becomes

$$E[Z_t \mid \mathcal{F}_{t-1}] = \Delta(G) E[X_t \mid X_{t-1}], \quad \forall t. \tag{9}$$



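This can be verified as follows. First, using proposition 2.1 again, we rewrite (8) as

$$e_i^\top E[X_t Y_t^\top \mid \mathcal{F}_{t-1}] f_j = G_{ji}\, e_i^\top E[X_t \mid \mathcal{F}_{t-1}].$$

Since the right hand side of this equality equals $f_j^\top G\, \mathrm{diag}(e_i) E[X_t \mid \mathcal{F}_{t-1}]$, which is equal to $f_j^\top G\, \mathrm{diag}(E[X_t \mid \mathcal{F}_{t-1}]) e_i$, we get

$$E[X_t Y_t^\top \mid \mathcal{F}_{t-1}] = \mathrm{diag}(E[X_t \mid X_{t-1}]) G^\top.$$

Since $\mathrm{vec}(X_t Y_t^\top) = Z_t$ and

$$\mathrm{vec}(\mathrm{diag}(E[X_t \mid X_{t-1}]) G^\top) = (I_m \otimes \mathrm{diag}(E[X_t \mid X_{t-1}])) \mathrm{vec}(G^\top),$$

we use (6) to write the RHS of this last equality as $\Delta(G) E[X_t \mid X_{t-1}]$, from which (9) follows.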

Remark 2.3 The validity of equation (9) has been seen to be a consequence of the special form of the transition matrix $Q$ in (4). But also the converse holds. If (9) holds, we get at once that $Z$ is $\mathbb{F}$-Markov, if $X$ is $\mathbb{F}$-Markov. And if we denote the transition matrix of $Z$ by $Q$ and that of $X$ by $A$, we automatically get (4) back. See proposition 3.1.



As an alternative to looking at the bivariate process $(X, Y)$ via the process $Z$ as above, we study the process $W$, again built from $X$ and $Y$ and defined by $W_t = Y_{t-1} \otimes X_t$ for $t \ge 1$. Along with this process we consider the filtration $\mathbb{G}$ of $\sigma$-algebras $\mathcal{G}_t := \sigma\{H_0, \ldots, H_{t-1}, X_0, \ldots, X_t\}$. Then $W$ is $\mathbb{G}$-adapted and the $\mathcal{G}_t$ and the $\mathcal{F}_t$ are related by $\mathcal{F}_{t-1} \vee \sigma(X_t) = \mathcal{G}_t$ and $\mathcal{G}_t \vee \sigma(H_t) = \mathcal{F}_t$. Then by similar computations as we carried out before and by using the Markov property of $Z$ we obtain the relations

$$E[W_t \mid \mathcal{F}_{t-1}] = (I_m \otimes A) Z_{t-1}, \tag{10}$$

$$E[W_t \mid \mathcal{G}_{t-1}] = G X_{t-1} \otimes A X_{t-1} = (I_m \otimes A) \Delta(G) X_{t-1}. \tag{11}$$

In particular it follows that $W$ is $\mathbb{G}$-Markov (and hence the pair $(X, Y)$ is a stochastic system in the sense of Van Schuppen [6], see section 4) with transition matrix

$$R := (I_m \otimes A) \Delta(G) (\mathbf{1}_m^\top \otimes I_n). \tag{12}$$

Observe also that $W$ has the splitting property

$$E[W_{t+1} \mid \mathcal{G}_t] = E[Y_t \mid \mathcal{G}_t] \otimes E[X_{t+1} \mid \mathcal{G}_t], \tag{13}$$

which immediately follows from (11).
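The identity $G X_{t-1} \otimes A X_{t-1} = (I_m \otimes A)\Delta(G) X_{t-1}$ can be checked column by column, since $X_{t-1}$ is always a basis vector. The sketch below uses assumed example matrices and hypothetical helper names; `R_bar` denotes the $nm \times n$ matrix $(I_m \otimes A)\Delta(G)$, so that the transition matrix of $W$ is `R_bar` times $(\mathbf{1}_m^\top \otimes I_n)$.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def identity(n):
    return [[1 if i == k else 0 for k in range(n)] for i in range(n)]

def kron(P, Q):
    """Kronecker product of two matrices."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]

def delta(G):
    """Stack diag(G_1.), ..., diag(G_m.) into an (n*m) x n matrix."""
    m, n = len(G), len(G[0])
    return [[G[j][i] if k == i else 0 for k in range(n)]
            for j in range(m) for i in range(n)]

def column(M, k):
    return [row[k] for row in M]

# Illustrative (assumed) parameters: n = 2, m = 3.
A = [[Fraction(9, 10), Fraction(2, 10)], [Fraction(1, 10), Fraction(8, 10)]]
G = [[Fraction(7, 10), Fraction(1, 10)],
     [Fraction(2, 10), Fraction(3, 10)],
     [Fraction(1, 10), Fraction(6, 10)]]
m, n = len(G), len(A)

R_bar = matmul(kron(identity(m), A), delta(G))   # (I_m kron A) Delta(G)

# For X_{t-1} = e_k: column k of R_bar should equal (G e_k) kron (A e_k).
splitting_ok = [column(R_bar, k) == [g * a for g in column(G, k) for a in column(A, k)]
                for k in range(n)]
column_sums = [sum(column(R_bar, k)) for k in range(n)]
```

Each column of `R_bar` is the Kronecker product of an output law and a next-state law, which is the splitting property (13) written out for a fixed current state.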

Remark 2.4 The assumption in this section that the sequence $\{H_t\}$ is iid with $EH_t = G$ can in principle be relaxed to assuming that $\{H_t - G\}$ is a martingale difference sequence with respect to its own filtration without changing the results of this section. However, this only appears to be a relaxation; in fact they are equivalent assumptions. Indeed, let $\{H_t - G\}$ be a martingale difference sequence and consider $k_t = \mathrm{vec}(H_t)$. Then $k_t$ takes its values in the set of basis vectors of $\mathbb{R}^{mn}$ and $\{k_t - \mathrm{vec}(G)\}$ is again a martingale difference sequence. Let $e$ be one of these basis vectors. Then $P(k_{t+1} = e \mid k_0, \ldots, k_t) = e^\top E[k_{t+1} \mid k_0, \ldots, k_t] = e^\top \mathrm{vec}(G)$, which doesn't depend on $k_0, \ldots, k_t$, nor on time. Hence $\{k_t\}$ is an iid sequence and so is $\{H_t\}$.
We can also replace (2) with the equivalent equation

$$Y_t = G X_t + \eta_t, \tag{14}$$

where $\eta$ forms a martingale difference sequence with respect to $\{\mathcal{F}_t\}$, and it even holds that $\eta_t = Y_t - E[Y_t \mid \sigma(X_t) \vee \mathcal{F}_{t-1}] = Y_t - E[Y_t \mid \mathcal{G}_t]$. The combined set of equations (1) and (14) is of the form that is commonly used in (stochastic) systems theory. We will come back to stochastic systems in section 4.

Remark 2.5 As a final remark we notice that all the properties mentioned above in terms of conditional expectations given the $\sigma$-algebras $\mathcal{F}_t$ and $\mathcal{G}_t$ remain valid if we replace the former one with $\sigma\{X_0, \ldots, X_t, Y_0, \ldots, Y_t\}$ and the latter one with $\sigma\{X_0, \ldots, X_t, Y_0, \ldots, Y_{t-1}\}$. Hence the law of the bivariate process $(X, Y)$, being a Markov chain with respect to its own filtration, is completely specified by the matrices $A$ and $G$ and the initial law of $X$. It follows that any bivariate Markov process $(X, Y)$, that is such that the transition matrix $Q$ of the associated process $Z = Y \otimes X$ is of the form (4) and that has initial law $EZ_0 = \Delta(G) p_0$ where $p_0 = EX_0$, can be constructed as the output of the system (1) and (2).



In view of remark 2.5 above we adopt the following 



Definition 2.6 A bivariate process $(X, Y)$ that assumes finitely many values is called a Hidden Markov Chain (HMC) if the process $Z = Y \otimes X$ is Markov with respect to the filtration $\mathbb{F} = \{\mathcal{F}_t\}$ defined by $\mathcal{F}_t = \sigma\{X_0, \ldots, X_t, Y_0, \ldots, Y_t\}$ and if its matrix of transition probabilities is given by (4).

3 Alternative descriptions of a HMC 

There are various ways to describe some properties of a stochastic system or a 
Hidden Markov chain. We mention a few possibilities and show how these can 
be used as building stones for a HMC. 

Let $X$ and $Y$ be two stochastic processes taking values in the sets $E$ and $F$ respectively, like in section 2. Let $Z$ again be the process $Y \otimes X$. For the time being no further assumptions on $X$ and $Y$ are imposed, except that redundant states are excluded in the sense that each state of $X$ is visited at least once with probability one and likewise for $Y$.

In this section (and all subsequent ones) we assume that for all $t$ the $\sigma$-algebra $\mathcal{F}_t$ is generated by $X_0, \ldots, X_t, Y_0, \ldots, Y_t$. The family $\{\mathcal{F}_t\}$ is again denoted by $\mathbb{F}$. We also consider the process $W$ again, with $W_t = Y_{t-1} \otimes X_t$, adapted to the filtration $\mathbb{G} = \{\mathcal{G}_t\}$, with $\mathcal{G}_t$ generated by $X_0, \ldots, X_t, Y_0, \ldots, Y_{t-1}$. Notice again the relations

$$\mathcal{F}_t = \mathcal{G}_t \vee \sigma(Y_t), \qquad \mathcal{G}_t = \mathcal{F}_{t-1} \vee \sigma(X_t).$$


3.1 Alternative description of Z 

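We now list a set of possible properties that the processes $X$, $Y$ and $Z$ may possess.

1. The process $Z$ is time homogeneous $\mathbb{F}$-Markov with matrix $Q$ of transition probabilities, so $E[Z_{t+1} \mid \mathcal{F}_t] = Q Z_t$. Moreover we assume that this conditional expectation only depends on $X_t$, which implies that there exists a matrix $\bar{Q}$ such that $E[Z_{t+1} \mid \mathcal{F}_t] = \bar{Q} X_t = \bar{Q} (\mathbf{1}_m^\top \otimes I_n) Z_t$, $\forall t$. Hence $Q = \bar{Q} (\mathbf{1}_m^\top \otimes I_n)$.

2. The output property holds:

$$E[Y_t \mid \mathcal{G}_t] = E[Y_t \mid \mathcal{F}_{t-1} \vee \sigma(X_t)] = E[Y_t \mid \sigma(X_t)], \quad \forall t.$$

If this property holds, we use the matrix $G$ defined by $E[Y_t \mid \sigma(X_t)] = G X_t$, where we also assume that $G$ does not depend on $t$. $G$ is then such that the columns $G_{\cdot i}$ are equal to $E[Y_t \mid X_t = e_i]$.

3. The extended output property holds:

$$E[Z_t \mid \mathcal{G}_t] = E[Z_t \mid \mathcal{F}_{t-1} \vee \sigma(X_t)] = E[Z_t \mid \sigma(X_t)], \quad \forall t. \tag{15}$$

In this case we define the matrix $B$ (assumed to be independent of $t$) by $E[Z_t \mid \sigma(X_t)] = B X_t$.

4. The factorization property holds: There exists a matrix $K \in \mathbb{R}^{m \times n}$ such that

$$E[Z_t \mid \mathcal{F}_{t-1}] = \Delta(K) E[X_t \mid \mathcal{F}_{t-1}], \quad \forall t. \tag{16}$$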

First we comment on the factorization property. We showed that it is valid for the HMC of section 2. But one can always factorize $E[Z_t \mid \mathcal{F}_{t-1}]$ with a second factor $E[X_t \mid \mathcal{F}_{t-1}]$ as in (16); however, in general the left factor is a random ($\mathcal{F}_{t-1}$-measurable) diagonal matrix, see equation (19) below.

Denote by $P_i$ the conditional measure on $(\Omega, \mathcal{F})$ given $X_t = e_i$. Expectation with respect to these measures will be denoted by $E_i$, with the understanding that expectations $E_i U$ are set equal to zero if $P(X_t = e_i) = 0$ (cf. the appendix). Then for any sub-$\sigma$-algebra $\mathcal{F}^0$ of $\mathcal{F}$ and any integrable random variable $U$ we have from equation (37) in the appendix the relation

$$E[U 1_{\{X_t = e_i\}} \mid \mathcal{F}^0] = E_i[U \mid \mathcal{F}^0] P(X_t = e_i \mid \mathcal{F}^0). \tag{17}$$

Application of equation (37) with $U = Y_t^\top$, $\mathcal{F}^0 = \mathcal{F}_{t-1}$ for all $i$ yields

$$E[X_t Y_t^\top \mid \mathcal{F}_{t-1}] = \mathrm{diag}(E[X_t \mid \mathcal{F}_{t-1}]) \mathbf{E}^\top, \tag{18}$$

where $\mathbf{E}^\top$ is the transpose of the matrix $\mathbf{E}$ that has columns $E_i[Y_t \mid \mathcal{F}_{t-1}]$. Apply then (7) to get

$$E[Z_t \mid \mathcal{F}_{t-1}] = \Delta(\mathbf{E}) E[X_t \mid \mathcal{F}_{t-1}]. \tag{19}$$

Proposition 3.1 Properties 2, 3 and 4 are equivalent. Moreover the matrices $B$, $G$ and $K$ are related via $B = \Delta(G)$ and $K = G$.



Proof. Trivially the output property 2 follows from the extended output property 3 by left multiplication with $I_m \otimes \mathbf{1}_n^\top$.

Conversely, assume that the output property holds. Then we have $E[Z_t \mid \mathcal{F}_{t-1} \vee \sigma(X_t)] = E[Y_t \mid \mathcal{F}_{t-1} \vee \sigma(X_t)] \otimes X_t = E[Y_t \mid X_t] \otimes X_t = E[Z_t \mid X_t]$, which shows that the extended output property holds.

To see the relation between $B$ and $G$, notice that in this case we have $B X_t = E[Z_t \mid X_t] = E[Y_t \mid X_t] \otimes X_t = G X_t \otimes X_t = \mathrm{vec}(X_t X_t^\top G^\top) = \mathrm{vec}(\mathrm{diag}(X_t) G^\top) = \Delta(G) X_t$. Here we used the usual relations between the vec-operator and Kronecker products as well as (7) in the last equality.

Assume that the extended output property holds. Use then reconditioning in (15) to get: $E[Z_t \mid \mathcal{F}_{t-1}] = E[E[Z_t \mid \mathcal{F}_{t-1} \vee \sigma(X_t)] \mid \mathcal{F}_{t-1}] = E[E[Z_t \mid X_t] \mid \mathcal{F}_{t-1}] = B E[X_t \mid \mathcal{F}_{t-1}]$. It follows from (19) that $B = \Delta(\mathbf{E})$, but since $B$ is nonrandom, the validity of the factorization property follows.

Conversely, assume that the factorization property 4 holds. Take expectations in (16). Then $EZ_t = \Delta(K) EX_t$. From the definition of $G$ (in property 2) we get $EZ_t = E[E[Y_t \mid \sigma(X_t)] \otimes X_t] = E(G X_t \otimes X_t) = \Delta(G) EX_t$. Since for each $i$ there is a $t$ such that the $i$-th component of $EX_t$ is strictly positive, it follows from the blockwise diagonal structure of the $\Delta$-matrices that $\Delta(G) = \Delta(K)$ and $G = K$.

Next we show that the output property holds. Assume for a moment that all elements of $EX_t$ are positive. According to equations (35) and (37) we have

$$E[Y_t \mid \mathcal{F}_{t-1} \vee \sigma(X_t)] = \sum_{i=1}^{n} \frac{E[Y_t e_i^\top X_t \mid \mathcal{F}_{t-1}]}{e_i^\top E[X_t \mid \mathcal{F}_{t-1}]} e_i^\top X_t.$$

Since $Y_t e_i^\top X_t = (I_m \otimes e_i^\top) Z_t$ and using the factorization property, we can rewrite this as

$$\sum_{i=1}^{n} \frac{(I_m \otimes e_i^\top) \Delta(G) E[X_t \mid \mathcal{F}_{t-1}]}{e_i^\top E[X_t \mid \mathcal{F}_{t-1}]} e_i^\top X_t.$$

Because $(I_m \otimes e_i^\top) \Delta(G) = G e_i e_i^\top$, this reduces to

$$\sum_{i=1}^{n} \frac{G e_i e_i^\top E[X_t \mid \mathcal{F}_{t-1}]}{e_i^\top E[X_t \mid \mathcal{F}_{t-1}]} e_i^\top X_t,$$

which in turn is nothing else but $G X_t$, from which we obtain the output property. In the case where the vector $EX_t$ has some elements equal to zero, the above procedure is still valid, provided we let the summation indices run through the set $\{i : e_i^\top EX_t > 0\}$. □

Similar to what we found in the previous section we have 

Proposition 3.2 Assume that the factorization property 4 holds (or, equivalently in view of proposition 3.1, the output or extended output property). Then the following two statements are equivalent.

(i) $Z$ is $\mathbb{F}$-Markov with transition matrix $Q = \bar{Q} (\mathbf{1}_m^\top \otimes I_n)$.

(ii) $X$ is $\mathbb{F}$-Markov with transition matrix $A$.

Furthermore we have in each of these situations the relation $\bar{Q} = \Delta(G) A$.



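Proof. (i) $\Rightarrow$ (ii): Clearly $X$ is $\mathbb{F}$-Markov with transition matrix $A = (\mathbf{1}_m^\top \otimes I_n) \bar{Q}$ and then it follows from the factorization property that $\bar{Q} X_{t-1} = E[Z_t \mid \mathcal{F}_{t-1}] = \Delta(G) E[X_t \mid \mathcal{F}_{t-1}] = \Delta(G) A X_{t-1}$.

Conversely, (ii) $\Rightarrow$ (i) follows in a similar way: $E[Z_t \mid \mathcal{F}_{t-1}] = \Delta(G) E[X_t \mid \mathcal{F}_{t-1}] = \Delta(G) A X_{t-1} = \Delta(G) A (\mathbf{1}_m^\top \otimes I_n) Z_{t-1}$, so $Z$ is $\mathbb{F}$-Markov with transition matrix $Q = \Delta(G) A (\mathbf{1}_m^\top \otimes I_n)$. □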



Remark 3.3 The main implication of proposition 3.2 is that the process $Z$ is a Markov chain whose transition probabilities only depend on the past value of $X$, if one starts out with an $\mathbb{F}$-Markov chain $X$ and imposes that the output condition holds. Clearly, if $X$ is just Markov with respect to its own filtration and if the factorization property is replaced with the stronger condition $E[Z_t \mid \mathcal{F}_{t-1}] = \Delta(K) E[X_t \mid \mathcal{F}^X_{t-1}]$, the same conclusion follows.

Remark 3.4 We also observe, like in section 2, that the fact that $Z$ is $\mathbb{F}$-Markov with $\bar{Q} = \Delta(G) A$ implies that $W$ is $\mathbb{G}$-Markov, with transition matrix $\bar{R} = (I_m \otimes A) \Delta(G)$. One easily checks that with the present choice of the filtrations equations (10) and (11) remain valid, and that in particular the factorization property holds.

3.2 Alternative description of W 

Like in subsection 3.1, we can also list a set of desirable properties of $W$. Consider thereto

1. $W$ is a time homogeneous $\mathbb{G}$-Markov chain with a transition matrix $R$. Moreover, the conditional expectation $E[W_{t+1} \mid \mathcal{G}_t]$ depends only on $X_t$. This means that there is a matrix $\bar{R}$ such that $R = \bar{R} (\mathbf{1}_m^\top \otimes I_n)$.

2. The splitting property holds:

$$E[W_t \mid \mathcal{G}_{t-1}] = E[Y_{t-1} \mid \mathcal{G}_{t-1}] \otimes E[X_t \mid \mathcal{G}_{t-1}], \quad \forall t. \tag{20}$$



Then we have, similar to proposition 3.2,

Proposition 3.5 Under the splitting property (20) there is equivalence between

(i) $W$ is $\mathbb{G}$-Markov with transition matrix $R = \bar{R} (\mathbf{1}_m^\top \otimes I_n)$.

(ii) $X$ is $\mathbb{G}$-Markov with a transition matrix $A$.

Moreover, in each of these cases we have the relation $\bar{R} = (I_m \otimes A) \Delta(G)$.



Proof. We omit the proof of proposition 3.5, since it is similar to that of proposition 3.2. □



Remark 3.6 The main message of proposition 3.5 is that to have $W$ Markov with transition probabilities only depending on past values of $X$ it is sufficient to start with a $\mathbb{G}$-Markov chain $X$ and to assume that the splitting property (20) holds.



Remark 3.7 We noticed in remark 3.4 that from the assumption that $Z$ is $\mathbb{F}$-Markov and the validity of the factorization property, one could deduce that $W$ is $\mathbb{G}$-Markov. Conversely, given that $W$ is $\mathbb{G}$-Markov with the transition matrix as in (12) above, we can also deduce that $Z$ is $\mathbb{F}$-Markov with $Q$ as in (4) as its transition matrix (and that equation (16) holds). This also follows from more general considerations to be explained at the end of section 4, but here we give an explicit calculation.

So let $W$ be a $\mathbb{G}$-Markov process with transition matrix $R = \bar{R} (\mathbf{1}_m^\top \otimes I_n)$. Then $E[W_t \mid \mathcal{G}_{t-1}] = \bar{R} X_{t-1}$. From this it follows that

$$E[X_t \mid \mathcal{G}_{t-1}] = (\mathbf{1}_m^\top \otimes I_n) \bar{R} X_{t-1} = A X_{t-1}$$

with $A = (\mathbf{1}_m^\top \otimes I_n) \bar{R}$. Furthermore we have $E[Y_t \mid \mathcal{G}_t] = (I_m \otimes \mathbf{1}_n^\top) E[W_{t+1} \mid \mathcal{G}_t] = (I_m \otimes \mathbf{1}_n^\top) \bar{R} X_t = G X_t$ with $G = (I_m \otimes \mathbf{1}_n^\top) \bar{R}$.

We now compute $E[Z_{t+1} \mid \mathcal{F}_t] = E[E[Y_{t+1} \mid \mathcal{G}_{t+1}] \otimes X_{t+1} \mid \mathcal{F}_t]$. By the relation that we just showed, this becomes $E[G X_{t+1} \otimes X_{t+1} \mid \mathcal{F}_t]$, which is $\Delta(G) E[X_{t+1} \mid \mathcal{F}_t]$. We have reached our goal as soon as we show that $E[X_{t+1} \mid \mathcal{F}_t] = E[X_{t+1} \mid \mathcal{G}_t]$. But it is easy to see that this follows immediately from the splitting property (actually it is equivalent).

Thus we showed the Markov property of $Z$ with respect to $\mathbb{F}$ and found its transition matrix.

Altogether we summarize our findings of this section in 

Theorem 3.8 There is equivalence between

(a) $X$ is $\mathbb{F}$-Markov and the factorization property holds.

(b) $X$ is $\mathbb{G}$-Markov and the splitting property holds.

(c) $Z$ is $\mathbb{F}$-Markov with transition matrix $Q$ as in (4).

(d) $W$ is $\mathbb{G}$-Markov with transition matrix $R$ as in (12).

(e) $(X, Y)$ is a hidden Markov chain.



Proof. The equivalence of (a) and (c) is just proposition 3.2, that of (b) and (d) is proposition 3.5. Equivalence of (c) and (d) is the content of remarks 3.4 and 3.7, whereas (c) and (e) are equivalent by definition 2.6 of a hidden Markov chain. □



4 Stochastic systems 

In the previous sections we restricted ourselves to time homogeneous processes, implying that all conditional probabilities and expectations don't depend on time directly. In the present section, where explicit calculations are absent, this restriction plays no role. We introduce some notation. Given a stochastic process $C$ with values in some arbitrary measurable space, we denote for all $t$ by $\mathcal{F}_t^{C}$ the $\sigma$-algebra generated by the $C_s$ for $s \le t$ and by $\mathcal{F}_t^{C+}$ the $\sigma$-algebra generated by the $C_s$ for $s \ge t$. Many of the results in the previous sections can be abstractly formulated in terms of properties of stochastic systems. A stochastic system is a formally defined concept. The main ingredients are a state process $X$ and an output process $Y$ (defined on a suitable probability space and taking values in some other spaces) and certain conditional independence relations.
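Let us therefore recall some facts on conditional independence. Two $\sigma$-algebras $\mathcal{H}_1$ and $\mathcal{H}_2$ are called conditionally independent given a $\sigma$-algebra $\mathcal{G}$ if for all bounded $\mathcal{H}_i$-measurable functions $H_i$ ($i = 1, 2$) the relation $E[H_1 H_2 \mid \mathcal{G}] = E[H_1 \mid \mathcal{G}] E[H_2 \mid \mathcal{G}]$ holds. A convenient characterization of this is that $\sigma$-algebras $\mathcal{H}_1$ and $\mathcal{H}_2$ are conditionally independent given a $\sigma$-algebra $\mathcal{G}$ if for all bounded $\mathcal{H}_1$-measurable functions $H_1$ the relation $E[H_1 \mid \mathcal{G} \vee \mathcal{H}_2] = E[H_1 \mid \mathcal{G}]$ holds.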
In the literature one can find two definitions of a stochastic system, that are slightly different. The first one is due to Picci [?], and the essential part of the definition is that for all $t$ the $\sigma$-algebras $\mathcal{F}_t^{X+} \vee \mathcal{F}_{t+1}^{Y+}$ and $\mathcal{F}_t^{X} \vee \mathcal{F}_t^{Y}$ are conditionally independent given $\sigma(X_t)$. The other one is due to Van Schuppen [6], in which the conditional independence relation between $\sigma$-algebras becomes: for all $t$ the $\sigma$-algebras $\mathcal{F}_t^{X+} \vee \mathcal{F}_t^{Y+}$ and $\mathcal{F}_t^{X} \vee \mathcal{F}_{t-1}^{Y}$ are conditionally independent given $\sigma(X_t)$. Implications of the two different definitions for the filtering problem will be discussed in section 5.

We will write $(X, Y) \in \Sigma_P$ if the pair of processes $(X, Y)$ is a stochastic system according to Picci's definition and $(X, Y) \in \Sigma_S$ if it is one in the sense of Van Schuppen [6]. Using this notation, we see that $(X, Y) \in \Sigma_P$ is equivalent with saying that $Z$ is an $\mathbb{F}$-Markov process with transition probabilities depending on $X$ only, and that $(X, Y) \in \Sigma_S$ is equivalent with saying that $W$ is a $\mathbb{G}$-Markov process with transition probabilities depending on $X$ only. Notice that both for a stochastic system $(X, Y)$ in $\Sigma_P$ and for one in $\Sigma_S$ the state process is always Markov relative to its own filtration.

An obvious relation between the different concepts is that $(X, Y) \in \Sigma_P$ iff $(X, \sigma Y) \in \Sigma_S$, where $\sigma Y$ is the process defined by $\sigma Y_t = Y_{t+1}$. Another relation is given in the following

Proposition 4.1 A pair $(X, Y)$ belongs to $\Sigma_S$ and the splitting property holds iff it belongs to $\Sigma_P$ and the output property (or the factorization property) holds.

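Proof. Suppose that $(X, Y) \in \Sigma_P$ and that the output property holds. Since $X$ is $\mathbb{F}$-Markov, we have $E[X_{t+1} \mid \mathcal{F}_t] = E[X_{t+1} \mid X_t]$, which is $\mathcal{G}_t$-measurable and therefore equal to $E[X_{t+1} \mid \mathcal{G}_t]$, which is equivalent to the splitting property because of the characterization of conditional independence given at the beginning of this section.

Next we show that $(X, Y)$ also belongs to $\Sigma_S$. We compute

$$\begin{aligned}
E[W_{t+1} \mid \mathcal{G}_t] &= E[E[W_{t+1} \mid \mathcal{F}_t] \mid \mathcal{G}_t] \\
&= E[Y_t \otimes E[X_{t+1} \mid \mathcal{F}_t] \mid \mathcal{G}_t] \\
&= E[Y_t \otimes E[X_{t+1} \mid X_t] \mid \mathcal{G}_t] \\
&= E[Y_t \mid \mathcal{G}_t] \otimes E[X_{t+1} \mid X_t],
\end{aligned}$$

which is $\sigma(X_t)$-measurable, because of the output property.

Conversely, letting $(X, Y) \in \Sigma_S$ we automatically get the output property, because $E[Y_t \mid \mathcal{G}_t] = (I_m \otimes \mathbf{1}_n^\top) E[W_{t+1} \mid \mathcal{G}_t] = (I_m \otimes \mathbf{1}_n^\top) E[W_{t+1} \mid X_t]$ in view of $(X, Y) \in \Sigma_S$. Assuming the conditional independence relation we obtain the Markov property of $Z$ from

$$\begin{aligned}
E[Z_{t+1} \mid \mathcal{F}_t] &= E[E[Z_{t+1} \mid \mathcal{G}_{t+1}] \mid \mathcal{F}_t] \\
&= E[E[Y_{t+1} \mid \mathcal{G}_{t+1}] \otimes X_{t+1} \mid \mathcal{F}_t] \\
&= E[E[Y_{t+1} \mid X_{t+1}] \otimes X_{t+1} \mid \mathcal{F}_t] && \text{(output property)} \\
&= E[E[Y_{t+1} \mid X_{t+1}] \otimes X_{t+1} \mid \mathcal{G}_t] && \text{(splitting property)} \\
&= E[E[Y_{t+1} \mid X_{t+1}] \otimes X_{t+1} \mid X_t] && \text{($W$ is $\mathbb{G}$-Markov)},
\end{aligned}$$

which shows that $(X, Y) \in \Sigma_P$. □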

Remark 4.2 Observe that we already encountered a computational form of this proposition in subsections 3.1 and 3.2.



The connection between systems in $\Sigma_P$ and $\Sigma_S$ and Hidden Markov chains is described as

Proposition 4.3 A finite valued time homogeneous system belonging both to $\Sigma_P$ and to $\Sigma_S$ is a Hidden Markov chain and vice versa.

Proof. If $(X, Y)$ is a HMC, then it follows from theorem 3.8 that it belongs to both $\Sigma_P$ and $\Sigma_S$. The converse statement follows in a similar way from this theorem. □



5 Filtering 

In this section we give some filtering and prediction formulas. By the filtering problem for a system $(X, Y)$ belonging to $\Sigma_P$ or to $\Sigma_S$ we mean the determination for each $t$ of the conditional law of $X_t$ given $Y_0, \ldots, Y_t$. As before, for each $t$ we denote by $\mathcal{F}_t^Y$ the $\sigma$-algebra generated by $Y_0, \ldots, Y_t$. Since the state space of $X$ is a set of basis vectors, this conditional law is completely determined by the conditional expectation $E[X_t \mid \mathcal{F}_t^Y]$. The prediction problem is to determine for each $t$ the conditional law of $X_{t+1}$ given $Y_0, \ldots, Y_t$, which is completely characterized by the conditional expectation $E[X_{t+1} \mid \mathcal{F}_t^Y]$. We will use the notations $E[X_t \mid \mathcal{F}_t^Y] = \hat{X}_t$ and $E[X_{t+1} \mid \mathcal{F}_t^Y] = \hat{X}_{t+1|t}$. Similarly we write $E[Y_{t+1} \mid \mathcal{F}_t^Y] = \hat{Y}_{t+1|t}$. In addition to the above one wants to have $\hat{X}_t$ and $\hat{X}_{t+1|t}$ in recursive form. We shall see below that the recursions for the cases $(X, Y) \in \Sigma_P$ and $(X, Y) \in \Sigma_S$ are different.

In the book [?] recursive formulae for unnormalized filters are obtained by a measure transformation. Here we undertake a direct approach, that leads to a simple recursive formula for the conditional probabilities themselves. The key argument is in all cases provided by lemma A.1.



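5.1 Filter for $\Sigma_P$

In this section we obtain the filter for a system in $\Sigma_P$, so we work with a Markov chain $Z_t = Y_t \otimes X_t$ with transition matrix $Q = \bar{Q} (\mathbf{1}_m^\top \otimes I_n)$. The matrix $\bar{Q}$ we can write as

$$\bar{Q} = \begin{bmatrix} \bar{Q}_1 \\ \vdots \\ \bar{Q}_m \end{bmatrix} \tag{21}$$

with the $\bar{Q}_i$ in $\mathbb{R}^{n \times n}$. No further assumptions on the $\bar{Q}_i$ are made. Observe that the $\bar{Q}_i$ have the interpretation that

$$\bar{Q}_i X_t = E[X_{t+1} 1_{\{Y_{t+1} = f_i\}} \mid \mathcal{F}_t]. \tag{22}$$

We have the following result (alternatively presented in [?]).

Theorem 5.1 The filter $\hat{X}$ is given by the recursion

$$\hat{X}_t = \sum_{i=1}^{m} \frac{1}{\mathbf{1}_n^\top \bar{Q}_i \hat{X}_{t-1}} \bar{Q}_i \hat{X}_{t-1} f_i^\top Y_t, \tag{23}$$

with the initial condition determined by the initial law of $Z$. The prediction $\hat{X}_{t+1|t}$ is equal to $A \hat{X}_t$ with $A = \sum_{i=1}^{m} \bar{Q}_i$ and $\hat{X}_{0|-1} = EX_0 = p_0$. For the prediction $\hat{Y}_{t+1|t}$ we have $\hat{Y}_{t+1|t} = C \hat{X}_t$ with $C = (I_m \otimes \mathbf{1}_n^\top) \bar{Q}$.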



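Proof. We use equation (35) with $\mathcal{F}^0 = \mathcal{F}_t^Y$, $\mathcal{H} = \sigma(Y_{t+1})$, which is generated by the sets $H_i = \{Y_{t+1} = f_i\}$, and $U = X_{t+1}$. Thus we obtain

$$E[X_{t+1} \mid \mathcal{F}_{t+1}^Y] = \sum_{i=1}^{m} \frac{E[X_{t+1} 1_{H_i} \mid \mathcal{F}_t^Y]}{P(H_i \mid \mathcal{F}_t^Y)} 1_{H_i}.$$

Then we use the Markov property of $Z$ to write

$$E[X_{t+1} 1_{H_i} \mid \mathcal{F}_t^Y] = E[E[X_{t+1} 1_{H_i} \mid \mathcal{F}_t] \mid \mathcal{F}_t^Y] = E[\bar{Q}_i X_t \mid \mathcal{F}_t^Y] = \bar{Q}_i \hat{X}_t.$$

Since $P(H_i \mid \mathcal{F}_t^Y) = E[1_{H_i} \mid \mathcal{F}_t^Y] = \mathbf{1}_n^\top E[X_{t+1} 1_{H_i} \mid \mathcal{F}_t^Y]$ we get equation (23).

Define now $A = \sum_{i=1}^{m} \bar{Q}_i = (\mathbf{1}_m^\top \otimes I_n) \bar{Q}$ and $C = (I_m \otimes \mathbf{1}_n^\top) \bar{Q}$. Then we have $E[X_{t+1} \mid \mathcal{F}_t] = A X_t$ and $E[Y_{t+1} \mid \mathcal{F}_t] = C X_t$. As a consequence we get by reconditioning that $\hat{X}_{t+1|t} = A \hat{X}_t$ and that $\hat{Y}_{t+1|t} = C \hat{X}_t$. □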

We see that the filter $\hat{X}_t$ satisfies a completely recursive system, that is, $\hat{X}_t$ is completely determined by $\hat{X}_{t-1}$ and $Y_t$. In absence of further conditions on the matrix $Q$ (in particular the factorization property) there seems to be no complete recursion that is satisfied by $\hat{X}_{t|t-1}$. The reason for this is that we don't have the Markov property of $W$ with respect to $\mathbb{G}$, unless the factorization property holds, in which case the formulas above take a particularly nice form. See subsection 5.3.



Remark 5.2 It follows from equation (22) that the filter (23) can alternatively be expressed as

$$\hat{X}_t = [\bar{Q}_1 \hat{X}_{t-1}, \ldots, \bar{Q}_m \hat{X}_{t-1}]\, \mathrm{diag}(C \hat{X}_{t-1})^{-1} Y_t.$$

Indeed, from equation (22) we obtain $\mathbf{1}_n^\top \bar{Q}_i X_t = P(Y_{t+1} = f_i \mid \mathcal{F}_t)$, hence $E[Y_{t+1} \mid \mathcal{F}_t]$ is the vector with elements $\mathbf{1}_n^\top \bar{Q}_i X_t$. Conditioning of this vector on $\mathcal{F}_t^Y$ gives that $\hat{Y}_{t+1|t}$ is the vector with elements $\mathbf{1}_n^\top \bar{Q}_i \hat{X}_t$. So we can rewrite (23) as $\hat{X}_t = [\bar{Q}_1 \hat{X}_{t-1}, \ldots, \bar{Q}_m \hat{X}_{t-1}]\, \mathrm{diag}(\hat{Y}_{t|t-1})^{-1} Y_t$ and the result follows.
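One step of the filter recursion for a system in $\Sigma_P$ only needs the block of the transition matrix selected by the new observation. The sketch below is an illustration under assumptions: the function name `filter_step` is hypothetical, the observation is passed as an index `y` (meaning $Y_t = f_{y+1}$), and the example blocks are taken of the special form $\mathrm{diag}(G_{i\cdot})A$ with assumed values of `A` and `G`.

```python
from fractions import Fraction

def filter_step(Q_blocks, x_prev, y):
    """Update the filter from x_hat_{t-1} to x_hat_t when Y_t = f_y is observed.

    Q_blocks[i] is the n x n block applied when the output equals f_i.
    The normaliser is the sum of the unnormalised vector, i.e. the
    conditional probability of observing f_y given the past outputs.
    """
    block = Q_blocks[y]
    unnormalised = [sum(q * x for q, x in zip(row, x_prev)) for row in block]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Illustrative (assumed) parameters; blocks of the form diag(G_i.) A.
A = [[Fraction(9, 10), Fraction(2, 10)], [Fraction(1, 10), Fraction(8, 10)]]
G = [[Fraction(7, 10), Fraction(1, 10)],
     [Fraction(2, 10), Fraction(3, 10)],
     [Fraction(1, 10), Fraction(6, 10)]]
Q_blocks = [[[G[i][r] * A[r][c] for c in range(2)] for r in range(2)]
            for i in range(3)]

x_prev = [Fraction(1, 2), Fraction(1, 2)]
x_new = filter_step(Q_blocks, x_prev, y=2)   # third output symbol observed
```

Because $Y_t$ is a basis vector, the sum over $i$ in the recursion collapses to the single term with $f_i = Y_t$, which is why the function only touches one block.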



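5.2 Filter for $\Sigma_S$

In this section we obtain the filter for a system in $\Sigma_S$, so we work with a Markov chain $W_t = Y_{t-1} \otimes X_t$ with transition matrix $R = \bar{R} (\mathbf{1}_m^\top \otimes I_n)$, where the matrix $\bar{R}$ can be written as

$$\bar{R} = \begin{bmatrix} \bar{R}_1 \\ \vdots \\ \bar{R}_m \end{bmatrix} \tag{24}$$

for certain matrices $\bar{R}_i$ in $\mathbb{R}^{n \times n}$. No further assumptions on the $\bar{R}_i$ are made. Observe that the $\bar{R}_i$ have the interpretation that

$$\bar{R}_i X_t = E[X_{t+1} 1_{\{Y_t = f_i\}} \mid \mathcal{G}_t]. \tag{25}$$

Then we have

Theorem 5.3 The predictor $\hat{X}_{t|t-1}$ is given by the recursion

$$\hat{X}_{t+1|t} = \sum_{i=1}^{m} \frac{1}{\mathbf{1}_n^\top \bar{R}_i \hat{X}_{t|t-1}} \bar{R}_i \hat{X}_{t|t-1} f_i^\top Y_t, \tag{26}$$

with the initial condition $\hat{X}_{0|-1} = EX_0$. For the filter $\hat{X}_t$ and for $\hat{Y}_{t+1|t}$ we have the following relations:

$$\hat{X}_{t+1} = \mathrm{diag}(\hat{X}_{t+1|t}) G^\top \mathrm{diag}(\hat{Y}_{t+1|t})^{-1} Y_{t+1}, \tag{27}$$

where $G = (I_m \otimes \mathbf{1}_n^\top) \bar{R}$, and

$$\hat{Y}_{t+1|t} = G \hat{X}_{t+1|t}. \tag{28}$$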



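Proof. We use equation (35) with $\mathcal{F}^0 = \mathcal{F}_{t-1}^Y$, $\mathcal{H} = \sigma(Y_t)$, which is generated by the sets $H_i = \{Y_t = f_i\}$, and $U = X_{t+1}$. Then we obtain

$$E[X_{t+1} \mid \mathcal{F}_t^Y] = \sum_{i=1}^{m} \frac{E[X_{t+1} 1_{H_i} \mid \mathcal{F}_{t-1}^Y]}{P(H_i \mid \mathcal{F}_{t-1}^Y)} 1_{H_i}.$$

Then we use the Markov property of $W$ to write

$$\begin{aligned}
E[X_{t+1} 1_{H_i} \mid \mathcal{F}_{t-1}^Y] &= E[E[X_{t+1} 1_{H_i} \mid \mathcal{G}_t] \mid \mathcal{F}_{t-1}^Y] \\
&= E[\bar{R}_i X_t \mid \mathcal{F}_{t-1}^Y] \\
&= \bar{R}_i \hat{X}_{t|t-1}.
\end{aligned}$$

Since $P(H_i \mid \mathcal{F}_{t-1}^Y) = E[1_{H_i} \mid \mathcal{F}_{t-1}^Y] = \mathbf{1}_n^\top E[X_{t+1} 1_{H_i} \mid \mathcal{F}_{t-1}^Y]$ we get equation (26).

To derive the formula (27) for the filter we proceed similarly, using lemma A.1 again with $U = X_{t+1}$, $\mathcal{F}^0 = \mathcal{F}_t^Y$ and $\mathcal{H} = \sigma(Y_{t+1})$ generated by the sets $H_i = \{Y_{t+1} = f_i\}$. Then we can write equation (35) as

$$E[X_{t+1} \mid \mathcal{F}_{t+1}^Y] = E[X_{t+1} Y_{t+1}^\top \mid \mathcal{F}_t^Y]\, \mathrm{diag}(\hat{Y}_{t+1|t})^{-1} Y_{t+1},$$

and compute

$$\begin{aligned}
E[X_{t+1} Y_{t+1}^\top \mid \mathcal{F}_t^Y] &= E[X_{t+1} E[Y_{t+1}^\top \mid \mathcal{G}_{t+1}] \mid \mathcal{F}_t^Y] \\
&= E[X_{t+1} (G X_{t+1})^\top \mid \mathcal{F}_t^Y] \\
&= E[\mathrm{diag}(X_{t+1}) \mid \mathcal{F}_t^Y] G^\top.
\end{aligned}$$

Then equation (27) follows, as well as equation (28), since we have $E[Y_{t+1} \mid \mathcal{F}_t^Y] = E[Y_{t+1} X_{t+1}^\top \mid \mathcal{F}_t^Y] \mathbf{1}_n = G\, \mathrm{diag}(\hat{X}_{t+1|t}) \mathbf{1}_n = G \hat{X}_{t+1|t}$. □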



Remark 5.4 By a similar argument as in remark 5.2 we can rewrite the
recursion (26) for the predictor as

$$\hat{X}_{t+1|t} = [R_1\hat{X}_{t|t-1}, \ldots, R_m\hat{X}_{t|t-1}]\,\mathrm{diag}(G\hat{X}_{t|t-1})^{-1}\,Y_t.$$
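The predictor recursion of Theorem 5.3 can be sketched numerically as follows. This is only an illustration in plain Python: the function names `matvec` and `predictor_step`, the example matrices `R_blocks` and the observation sequence are assumptions made for this sketch and do not come from the paper. Observed values are encoded by the 0-based index i of the basis vector $f_i$.

```python
def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def predictor_step(R_blocks, x_pred, i):
    """One step of the predictor recursion.

    R_blocks[i] plays the role of R_i, x_pred is the predictor at time t,
    and i is the 0-based index of the observed value Y_t = f_i.
    """
    unnormalised = matvec(R_blocks[i], x_pred)  # R_i X_{t|t-1}
    mass = sum(unnormalised)                    # 1^T R_i X_{t|t-1}
    if mass <= 0.0:
        raise ValueError("observed value has conditional probability zero")
    return [u / mass for u in unnormalised]


# Illustrative data (assumption): n = 2 states, m = 2 output values.
# The columns of R_1 + R_2 sum to one.
R_blocks = [
    [[0.5, 0.1], [0.2, 0.1]],  # R_1
    [[0.1, 0.3], [0.2, 0.5]],  # R_2
]
x_pred = [0.5, 0.5]            # initial condition E X_0
trajectory = []
for i in [0, 1, 1, 0]:         # indices of the observed Y_0, ..., Y_3
    x_pred = predictor_step(R_blocks, x_pred, i)
    trajectory.append(x_pred)
```

Each entry of `trajectory` is a probability vector; dividing by `mass` is the scalar version of multiplying by the inverse diagonal matrix in Remark 5.4.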



Remark 5.5 Notice that, in contrast with what we got in subsection 5.1 for
Ep, here the predictor satisfies a completely recursive system, whereas we obtain
the filter in terms of the predictor.

The formulas above take a particularly nice form if the system satisfies the
splitting property; see subsection 5.3.



5.3 Filter for a Hidden Markov Chain 

In this section we return to the hidden Markov chain setting of the earlier sections and we give the
recursive filtering formula for the stochastic system with the HMC Y as its output.
To this end we can apply the results of subsection 5.1 with the specification that
$Q = \Delta(G)A$, so we have $Q_i = \mathrm{diag}(G_{i\cdot})A$ and $\mathbf{1}^\top Q_i = G_{i\cdot}A$. The following
holds.

Theorem 5.6 (i) The conditional distribution of $X_t$ given $Y_0, \ldots, Y_t$ is
recursively determined by

$$\hat{X}_t = \mathrm{diag}(A\hat{X}_{t-1})\,G^\top \mathrm{diag}(GA\hat{X}_{t-1})^{-1}\,Y_t, \qquad (29)$$

with initial condition $\hat{X}_0 = \mathrm{diag}(p_0)\,G^\top \mathrm{diag}(Gp_0)^{-1}\,Y_0$, with $p_0 = EX_0$.

(ii) The conditional distribution of $X_t$ given $Y_0, \ldots, Y_{t-1}$ is recursively
determined by

$$\hat{X}_{t+1|t} = A\,\mathrm{diag}(\hat{X}_{t|t-1})\,G^\top \mathrm{diag}(G\hat{X}_{t|t-1})^{-1}\,Y_t, \qquad (30)$$

with initial condition $\hat{X}_{0|-1} = EX_0 = p_0$.

(iii) The conditional expectation $\hat{Y}_{t+1|t} = E[Y_{t+1} \mid \mathcal{F}_t^Y]$ is given by

$$\hat{Y}_{t+1|t} = GA\,\mathrm{diag}(\hat{X}_{t|t-1})\,G^\top \mathrm{diag}(G\hat{X}_{t|t-1})^{-1}\,Y_t. \qquad (31)$$



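Proof. (i) Just use equation (23) and notice that

$$\frac{Q_i\hat{X}_{t-1}}{\mathbf{1}^\top Q_i\hat{X}_{t-1}} = \frac{\mathrm{diag}(A\hat{X}_{t-1})\,G_{i\cdot}^\top}{(GA\hat{X}_{t-1})_i}.$$

(ii) follows from (i), since we know from theorem 5.1 that $\hat{X}_{t+1|t} = A\hat{X}_t$.

(iii) also follows from theorem 5.1, upon noticing that G now becomes GA in
view of the identity $\mathbf{1}^\top Q_i = G_{i\cdot}A$. □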
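A small numerical sketch of the recursions in Theorem 5.6 may help to see how they fit together. The matrices `A` and `G`, the initial law `p0`, the observation sequence `obs` and all function names below are assumptions chosen for illustration; A is taken column-stochastic and column j of G holds the output probabilities given state j. Observations are encoded by the 0-based index of the observed basis vector.

```python
def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def bayes_update(G, x, i):
    """diag(x) G^T diag(G x)^{-1} f_i: condition the law x on the output f_i."""
    unnormalised = [G[i][j] * x[j] for j in range(len(x))]
    mass = sum(unnormalised)  # (G x)_i
    if mass <= 0.0:
        raise ValueError("observed value has conditional probability zero")
    return [u / mass for u in unnormalised]


def filter_step(A, G, x_filt_prev, i):
    """Filter recursion: update the one-step prediction A X_{t-1} with Y_t = f_i."""
    return bayes_update(G, matvec(A, x_filt_prev), i)


def predictor_step(A, G, x_pred, i):
    """Predictor recursion: update X_{t|t-1} with Y_t = f_i, then apply A."""
    return matvec(A, bayes_update(G, x_pred, i))


# Illustrative data (assumption): two states, two output values.
A = [[0.9, 0.2], [0.1, 0.8]]
G = [[0.7, 0.1], [0.3, 0.9]]
p0 = [0.5, 0.5]
obs = [0, 1, 1, 0]

x_filt = bayes_update(G, p0, obs[0])  # initial condition of the filter
x_pred = p0                           # initial condition of the predictor
history = []
for t, i in enumerate(obs):
    if t > 0:
        x_filt = filter_step(A, G, x_filt, i)
    x_pred = predictor_step(A, G, x_pred, i)  # this is X_{t+1|t}
    y_pred = matvec(G, x_pred)                # predicted output law, as in (iii)
    history.append((x_filt, x_pred, y_pred))
```

In this sketch the stored predictor always equals A applied to the stored filter, which is the relation used in part (ii) of the proof.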

Remark 5.7 Here both the filter and the predictor satisfy a complete recursive
system. This is not surprising, because a HMC is a stochastic system belonging
to both Ep and S5. Notice that theorem 5.6 can alternatively be derived from
theorem 5.3, since under the assumptions of the present subsection we have that
$R_i = A\,\mathrm{diag}(G_{i\cdot})$.
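The alternative derivation mentioned in this remark can be checked numerically. The sketch below builds the blocks $R_i = A\,\mathrm{diag}(G_{i\cdot})$ and compares the predictor recursion of theorem 5.3 with the one of theorem 5.6 (ii). The matrices, the observation sequence and the function names are illustrative assumptions, not taken from the paper.

```python
def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def s5_predictor_step(R_blocks, x_pred, i):
    """Predictor step of theorem 5.3: normalise R_i x."""
    unnormalised = matvec(R_blocks[i], x_pred)
    mass = sum(unnormalised)
    return [u / mass for u in unnormalised]


def hmc_predictor_step(A, G, x_pred, i):
    """Predictor step of theorem 5.6 (ii): A diag(x) G^T diag(G x)^{-1} f_i."""
    unnormalised = [G[i][j] * x_pred[j] for j in range(len(x_pred))]
    mass = sum(unnormalised)
    return matvec(A, [u / mass for u in unnormalised])


# Illustrative data (assumption): A column-stochastic, columns of G sum to one.
A = [[0.9, 0.2], [0.1, 0.8]]
G = [[0.7, 0.1], [0.3, 0.9]]
n, m = len(A), len(G)

# R_i = A diag(G_i.), so entry (r, c) of R_i equals A[r][c] * G[i][c].
R_blocks = [
    [[A[r][c] * G[i][c] for c in range(n)] for r in range(n)]
    for i in range(m)
]

x_s5 = [0.5, 0.5]
x_hmc = [0.5, 0.5]
max_gap = 0.0
for i in [0, 1, 1, 0, 1]:  # indices of the observed output values
    x_s5 = s5_predictor_step(R_blocks, x_s5, i)
    x_hmc = hmc_predictor_step(A, G, x_hmc, i)
    max_gap = max(max_gap, max(abs(a - b) for a, b in zip(x_s5, x_hmc)))
```

Up to floating point rounding, `max_gap` stays at zero, since the two normalising constants coincide when the columns of A sum to one.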

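Remark 5.8 If we define for $x \in \mathbb{R}^n$ the matrix

$$G_x := \mathrm{diag}(x)\,G^\top \mathrm{diag}(Gx)^{-1},$$

then equations (29), (30) and (31) take the form $\hat{X}_t = G_{A\hat{X}_{t-1}} Y_t$, $\hat{X}_{t+1|t} = A\,G_{\hat{X}_{t|t-1}} Y_t$
and $\hat{Y}_{t+1|t} = GA\,G_{\hat{X}_{t|t-1}} Y_t$.

One may check that under the condition that Y is a deterministic function of
X (in which case the columns of G are basis vectors of $\mathbb{R}^m$) the matrices $G_x$ are
right pseudo-inverses of G. Indeed, in that case $G\,\mathrm{diag}(x)\,G^\top = \mathrm{diag}(Gx)$, so that $G G_x = I_m$.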
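The right-inverse property can be verified on a small example. In the sketch below the matrix `G`, the vector `x` and the helper names are assumptions made for illustration: three states, where the first two are mapped to the first output value and the third to the second. All entries of Gx are assumed positive so that the inverse diagonal matrix exists.

```python
def G_x(G, x):
    """Return diag(x) G^T diag(G x)^{-1} as an n x m nested list."""
    Gx = [sum(g * xj for g, xj in zip(row, x)) for row in G]
    n, m = len(x), len(G)
    return [[x[j] * G[i][j] / Gx[i] for i in range(m)] for j in range(n)]


def matmul(P, Q):
    """Product of two nested-list matrices."""
    return [
        [sum(P[r][k] * Q[k][c] for k in range(len(Q))) for c in range(len(Q[0]))]
        for r in range(len(P))
    ]


# Deterministic output (assumption): every column of G is a basis vector.
G = [[1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
x = [0.2, 0.3, 0.5]

Gx_matrix = G_x(G, x)
product = matmul(G, Gx_matrix)  # expected to be the 2 x 2 identity matrix
```

The columns of `Gx_matrix` are probability vectors: column i is the law x conditioned on the output being $f_i$.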
A A lemma on conditional expectations

Consider some probability space $(\Omega, \mathcal{F}, P)$ and let $\mathcal{H}$ be a sub-σ-algebra of $\mathcal{F}$
that is generated by a finite partition $\{H_1, \ldots, H_k\}$ of $\Omega$, satisfying $P(H_i) > 0$
for all i. We introduce the (conditional) probability measures $P_i$ on $(\Omega, \mathcal{F})$
defined by $P_i(F) = E[1_F 1_{H_i}] / P(H_i) = P(F \mid H_i)$. Expectation with respect to $P_i$ is
denoted by $E_i$. Notice that for a random variable X with finite expectation we
have

$$E\,1_{H_i} X = P(H_i)\,E_i X. \qquad (32)$$

We also have, for any sub-σ-algebra $\mathcal{F}_0$ and any integrable random variable
X, the equality

$$E[X 1_{H_i} \mid \mathcal{F}_0] = P(H_i \mid \mathcal{F}_0)\,E_i[X \mid \mathcal{F}_0]. \qquad (33)$$

Recall that for any integrable random variable U it holds that

$$E[U \mid \mathcal{H}] = \sum_{i=1}^k E_i[U]\,1_{H_i}. \qquad (34)$$

We extend this result in the following lemma, which is easy to prove. It is used
frequently throughout the preceding sections.

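Lemma A.1 Let $\mathcal{F}^0$ be some sub-σ-algebra of $\mathcal{F}$. Then the following equalities
hold true.

$$E[U \mid \mathcal{F}^0 \vee \mathcal{H}] = \sum_{i=1}^k E_i[U \mid \mathcal{F}^0]\,1_{H_i}. \qquad (35)$$

$$E[U \mid \mathcal{F}^0] = \sum_{i=1}^k E_i[U \mid \mathcal{F}^0]\,E[1_{H_i} \mid \mathcal{F}^0]. \qquad (36)$$

$$E[1_{H_i} U \mid \mathcal{F}^0] = E[1_{H_i} \mid \mathcal{F}^0]\,E_i[U \mid \mathcal{F}^0]. \qquad (37)$$

Proof. Concerning the first equality we have to show that

$$E\Big\{1_{F \cap H_j} \sum_{i=1}^k E_i[U \mid \mathcal{F}^0]\,1_{H_i}\Big\} = E\{1_{F \cap H_j} U\}$$

for all $F \in \mathcal{F}^0$ and all j, because every set in $\mathcal{F}^0 \vee \mathcal{H}$ can be written as a finite union
of sets $F \cap H_j$ with some $F \in \mathcal{F}^0$ and because the RHS of (35) is clearly
$\mathcal{F}^0 \vee \mathcal{H}$-measurable. We develop

$$E\Big\{1_{F \cap H_j} \sum_{i=1}^k E_i[U \mid \mathcal{F}^0]\,1_{H_i}\Big\} = E\{1_{F \cap H_j} E_j[U \mid \mathcal{F}^0]\}$$
$$= E_j\{1_F E_j[U \mid \mathcal{F}^0]\}\,P(H_j)$$
$$= E_j\{1_F U\}\,P(H_j)$$
$$= E\{1_{F \cap H_j} U\}.$$

In these computations we used (32) in the second and fourth equality and the
defining property of conditional expectation in the third. This proves (35).
The second equality is a direct consequence of the first by conditioning on $\mathcal{F}^0$.
The third equality follows from the second one by taking $1_{H_i} U$ instead of U. □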
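The three equalities of the lemma can be checked by exact arithmetic on a finite probability space. The sketch below is an illustration only: the six-point space, the weights `P`, the random variable `U`, the two partitions and all helper names are assumptions made for this example. Conditional expectations are computed atom by atom and exact rationals are used, so the comparisons are equalities rather than approximations.

```python
from fractions import Fraction as Fr

# Illustrative finite probability space (assumption): six points with weights P.
P = [Fr(1, 12), Fr(2, 12), Fr(3, 12), Fr(1, 12), Fr(2, 12), Fr(3, 12)]
U = [Fr(v) for v in (4, -1, 2, 7, 0, 3)]  # the random variable U
H = [{0, 1, 2}, {3, 4, 5}]                 # partition generating the sigma-algebra H
F0 = [{0, 3, 4}, {1, 2, 5}]                # partition generating F^0
points = range(len(P))


def cond_exp(f, partition, prob):
    """E[f | sigma(partition)] under prob, as a list indexed by the points."""
    out = [Fr(0)] * len(prob)
    for atom in partition:
        mass = sum(prob[w] for w in atom)
        # Atoms of mass zero get the value 0, the convention of Remark A.2.
        value = sum(f[w] * prob[w] for w in atom) / mass if mass else Fr(0)
        for w in atom:
            out[w] = value
    return out


def indicator(event):
    return [Fr(1) if w in event else Fr(0) for w in points]


def conditional_measure(event):
    """Point masses of P(. | event)."""
    mass = sum(P[w] for w in event)
    return [P[w] / mass if w in event else Fr(0) for w in points]


join = [a & h for a in F0 for h in H if a & h]             # atoms of F^0 v H
Ei = [cond_exp(U, F0, conditional_measure(h)) for h in H]  # E_i[U | F^0]
pH = [cond_exp(indicator(h), F0, P) for h in H]            # E[1_{H_i} | F^0]

lemma_35 = cond_exp(U, join, P) == [
    sum(Ei[i][w] * indicator(H[i])[w] for i in range(len(H))) for w in points
]
lemma_36 = cond_exp(U, F0, P) == [
    sum(Ei[i][w] * pH[i][w] for i in range(len(H))) for w in points
]
lemma_37 = all(
    cond_exp([indicator(H[i])[w] * U[w] for w in points], F0, P)
    == [pH[i][w] * Ei[i][w] for w in points]
    for i in range(len(H))
)
```

On this example the three flags `lemma_35`, `lemma_36` and `lemma_37` all evaluate to `True`.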



Remark A.2 If we take in lemma A.1 for $\mathcal{F}^0$ the trivial σ-algebra, then (35)
reduces to (34). If $P(H_i) = 0$ for some i, then $P_i$ is not well defined but (35) is
still valid provided we define $E_i[U \mid \mathcal{F}^0]$ to be zero for such an i.



Remark A.3 Equation (37) is also known as the conditional Bayes theorem,
cf. [, page 23].